Abstract: Learning in the presence of adversarial perturbations is an important challenge for the machine learning community today. We consider a 'beyond Lp-norm' model in which the adversary can make arbitrarily large changes to the features; for an image, say, it can change the brightness and/or contrast (modeled as arbitrary movements in a random low-dimensional subspace). We discuss the power of two emerging tools in the theory of learning, namely abstention and data-driven algorithm design, in the fight against such adversaries.
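To make the threat model concrete, here is a minimal sketch (with hypothetical function and parameter names) of a perturbation confined to a random low-dimensional subspace of the feature space; a real adversary would choose the coefficients adversarially rather than at random, and `scale` is an arbitrary choice for illustration only.

```python
import numpy as np

def random_subspace_attack(x, k, scale=10.0, seed=None):
    """Perturb x by an arbitrarily large vector lying in a random
    k-dimensional subspace (hypothetical illustration of the threat model).

    A real adversary would optimize the coefficients; here they are drawn
    at random with a large `scale` purely for illustration.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Orthonormal basis of a random k-dimensional subspace of R^d.
    basis, _ = np.linalg.qr(rng.standard_normal((d, k)))
    coeffs = scale * rng.standard_normal(k)  # the adversary may pick ANY coefficients
    return x + basis @ coeffs

# Example: perturb a flattened 28x28 image along a random 2-dimensional subspace.
# x_adv = random_subspace_attack(np.zeros(784), k=2)
```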
Abstention incentivizes the learner to identify adversarially perturbed inputs. Our work provides the first formal abstention-based gap for inductive learning in a natural adversarial attack model: we show that classifiers with the ability to abstain are provably more powerful than those without it. We introduce a parameterized nearest-neighbor algorithm that uses thresholds to remove outliers and to decide where to predict. Our classifier can be viewed as a step towards proof-carrying predictions, where predictions are accompanied by certificates of confidence.
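As an illustration, the sketch below shows one way a thresholded, abstaining nearest-neighbor rule of this flavor could look; the thresholds `tau_outlier` and `tau_abstain` and the specific outlier-removal rule are assumptions made for illustration, not the exact procedure analyzed in the paper.

```python
import numpy as np

def abstaining_nn_predict(X_train, y_train, x, tau_outlier=1.0, tau_abstain=0.5):
    """Nearest-neighbor prediction with abstention (illustrative sketch only).

    tau_outlier: a training point is discarded as an outlier if its nearest
                 same-class neighbor is farther than this threshold.
    tau_abstain: abstain (return None) if the test point x is farther than
                 this from every retained training point.
    Both thresholds are hypothetical parameters, not the paper's exact rule.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x = np.asarray(x, dtype=float)

    # Outlier removal: keep points that have a nearby same-class neighbor.
    keep = []
    for i in range(len(X_train)):
        mask = (y_train == y_train[i])
        mask[i] = False  # exclude the point itself
        if mask.any():
            d = np.linalg.norm(X_train[mask] - X_train[i], axis=1)
            if d.min() <= tau_outlier:
                keep.append(i)
    if not keep:
        return None  # abstain if no training point survives outlier removal
    X_kept, y_kept = X_train[keep], y_train[keep]

    # Predict with the nearest retained neighbor, or abstain if it is too far.
    d = np.linalg.norm(X_kept - x, axis=1)
    j = int(np.argmin(d))
    return y_kept[j] if d[j] <= tau_abstain else None
```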
Data-driven algorithm selection is a new, beyond-worst-case perspective on algorithm design and analysis. It models situations where we repeatedly solve instances of the same optimization problem (think of your favorite hard combinatorial problem: knapsack, SAT, subset sum, ...). This allows us to select algorithms from parameterized algorithm families, treating parameter optimization as a statistical learning problem over the class of algorithms. Finding the best parameter is hard in the worst case, but under mild perturbation-invariance assumptions we show that we can learn the optimal data-specific threshold parameters and provably optimize the accuracy vs. robustness tradeoff.
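The following sketch illustrates the data-driven selection step under stated assumptions: given sample instances, a parameterized algorithm family `run_algorithm`, and an assumed accuracy-vs-robustness objective `score` (all hypothetical names), we pick the threshold with the best average performance. This is a simple empirical-risk-minimization stand-in for the learning procedure analyzed in the work, not the procedure itself.

```python
import numpy as np

def learn_threshold(instances, candidate_taus, run_algorithm, score):
    """Data-driven parameter selection (illustrative sketch).

    instances:      problem instances drawn from the application domain.
    candidate_taus: a grid of threshold parameters to try (assumed finite here).
    run_algorithm:  run_algorithm(instance, tau) -> solution; the parameterized
                    algorithm family (e.g. the abstaining classifier above).
    score:          score(instance, solution) -> float; an assumed objective,
                    e.g. a weighted combination of accuracy and robustness.

    Returns the parameter with the best average score over the instances,
    i.e. empirical risk minimization over the algorithm family.
    """
    avg_scores = [
        np.mean([score(inst, run_algorithm(inst, tau)) for inst in instances])
        for tau in candidate_taus
    ]
    return candidate_taus[int(np.argmax(avg_scores))]
```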
Joint work with Nina Balcan, Avrim Blum, and Hongyang Zhang.